SelectIT: Selective Instruction Tuning for LLMs via Uncertainty-Aware Self-Reflection

Neural Information Processing Systems

Instruction tuning (IT) is crucial for tailoring large language models (LLMs) towards human-centric interactions. Recent advancements have shown that the careful selection of a small, high-quality subset of IT data can significantly enhance the performance of LLMs. Despite this, common approaches often rely on additional models or data, which increases costs and limits widespread adoption. In this work, we propose a novel approach, termed SelectIT, that capitalizes on the foundational capabilities of the LLM itself. Specifically, we exploit the intrinsic uncertainty present in LLMs to more effectively select high-quality IT data, without the need for extra resources. Furthermore, we introduce a curated IT dataset, the Selective Alpaca, created by applying SelectIT to the Alpaca-GPT4 dataset.
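To make the idea of uncertainty-aware selection concrete, here is a minimal sketch of two plausible scoring components: a token-level score that weights the LLM's most likely quality rating by its confidence margin over the other rating tokens, and a sentence-level score that penalizes disagreement across paraphrased rating prompts. The function names, the confidence-margin formula, the penalty weight `alpha`, and the example log-probabilities are all illustrative assumptions, not the paper's exact formulation.

```python
import math
import statistics

def token_level_score(rating_logprobs):
    """Score one IT example from the log-probs an LLM assigns to the
    rating tokens 1..K in a quality-judgment prompt (values illustrative).
    Returns the most likely rating scaled by its confidence margin."""
    probs = {k: math.exp(lp) for k, lp in rating_logprobs.items()}
    z = sum(probs.values())
    probs = {k: p / z for k, p in probs.items()}  # normalize over rating tokens
    best = max(probs, key=probs.get)              # most likely rating
    others = [probs[k] for k in probs if k != best]
    margin = sum(probs[best] - p for p in others) / len(others)
    return best * margin                          # confident high ratings score highest

def sentence_level_score(per_prompt_scores, alpha=0.2):
    """Aggregate scores from several paraphrased rating prompts,
    penalizing spread between them as a proxy for uncertainty."""
    mean = statistics.mean(per_prompt_scores)
    spread = statistics.pstdev(per_prompt_scores)
    return mean - alpha * spread

# Made-up log-probs: one example the model rates confidently, one it does not.
confident = {1: -6.0, 2: -5.0, 3: -4.0, 4: -2.0, 5: -0.2}
uncertain = {1: -1.7, 2: -1.6, 3: -1.5, 4: -1.6, 5: -1.7}
```

Under this sketch, `token_level_score(confident)` is far larger than `token_level_score(uncertain)`, so sorting a dataset by such scores and keeping the top fraction yields a selected subset without any external judge model.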